Introduction

Data Focus

I have chosen three variables: the day the email was sent, the hour the email was sent, and the sentiment of the email body. Using only these variables also keeps the data anonymous and maintains privacy.

The aim is to compare email sentiment by day and by hour, and to see whether there is a relationship or trend between these two time scales and the sentiment of the emails.

Chart Types

I have tried out two (2) bar graphs, as that is my usual graph type of choice. However, I have also made two (2) scatter plots, which are much more useful for understanding the information.

We can compare the graphs side by side to see the differences between the graph types and potentially make observations that one graph type may have difficulty conveying on its own.

The “grammar of graphics”

I have used scales, with “scale_x_continuous()” for the hour-related graphs, to make it easier to read which hour is which. I have used geoms, with the “geom_col” and “geom_point” functions, to make the bar graphs and scatter plots. I have used “aes” for mapping variables and colours, alongside “labs” for labelling the axes and giving titles. “coord_cartesian” was used to set limits for the relevant x and y values. For the bar graphs, I used “geom_text” to add the actual mean sentiment value, rounded to 3 decimal places.
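As a minimal sketch of how these pieces fit together, the example below builds one small bar graph with the same functions. The data frame `toy_means` and its values are invented for illustration and are not taken from my emails.

```r
library(ggplot2)

# Hypothetical mean sentiment per hour (invented values, illustration only)
toy_means <- data.frame(
  hour = c(8, 9, 10, 11),
  mean_sentiment = c(0.412, -0.105, 0.633, 0.25)
)

toy_bar_graph <- ggplot(toy_means, aes(x = hour, y = mean_sentiment)) +
  geom_col(aes(fill = hour)) +                       # geom: one bar per hour
  geom_text(aes(label = round(mean_sentiment, 3),    # geom: value label per bar
                vjust = ifelse(mean_sentiment >= 0, -0.5, 1.5)),
            size = 3) +
  scale_x_continuous(breaks = 8:11) +                # scale: show every hour
  coord_cartesian(ylim = c(-1, 1)) +                 # coord: sentiment can be negative
  labs(title = "Toy example",
       x = "Hour", y = "Mean Email Body Sentiment")  # labels

toy_bar_graph
```

Each `+` adds one layer of the grammar, so a geom or a scale can be swapped without touching the rest of the graph.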

Things that didn’t work

“geom_text” on the scatter plot made the graph nearly illegible: the labels were so crowded together that the numbers themselves were nearly useless, so I left them off the scatter plots. The bar graphs do convey information, but they were not as useful as I had hoped, which is why I made the scatter plots. At first I also used the means of the sentiment_body score (the data behind the bar graphs) in the scatter plots, which was not what I wanted and was rather redundant. Finally, I had to add the “day” column to hour_day_sentiment_data, because without it the graph did not recognise the “day” variable. This is a different data frame from the ones used in the plots I made first.
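The difference between the data behind the bar graphs and the data behind the scatter plots can be sketched as follows. This is only an illustration: `toy_emails` and its values are made up, and `mutate()` is one possible way to keep every email as its own row.

```r
library(dplyr)

# Hypothetical emails (invented values, illustration only)
toy_emails <- tibble(
  hour = c(9, 9, 9, 14, 14),
  sentiment_body = c(0.8123, -0.4567, 0.6001, 0.2999, -0.7455)
)

# One row per hour: this is what the bar graphs need
hour_means <- toy_emails %>%
  group_by(hour) %>%
  summarise(mean_sentiment_rounded = round(mean(sentiment_body), 3))

# One row per email: this is what the scatter plots need
hour_points <- toy_emails %>%
  mutate(sentiment_rounded = round(sentiment_body, 3))

nrow(hour_means)   # 2
nrow(hour_points)  # 5
```

Plotting `hour_means` as points would only redraw the tops of the bars, which is why the means were redundant in a scatter plot.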

Visualisations

Day Scatter Plot

This graph will help us see any trends among the weekdays. Perhaps certain days affect sentiment scores?

## Rows: 514 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): email_from, email_to, email_subject, email_body, email_date, email_...
## dbl (5): email_year, email_hour, email_minute, sentiment_subject, sentiment_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## `summarise()` has grouped output by 'day'. You can override using the `.groups`
## argument.

Hour Scatter Plot

This graph will help us see any trends across different hours. Perhaps different hours affect the sentiment scores of emails?

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
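
The “Coordinate system already present” message above appears when “coord_cartesian” is added to the same graph twice: the second call replaces the first, so the limits set by the first call are lost. A minimal sketch with made-up points (the `toy_points` values are invented), showing both limits set in a single call instead:

```r
library(ggplot2)

# Hypothetical points (invented values, illustration only)
toy_points <- data.frame(
  hour = c(3, 9, 15, 21),
  sentiment = c(-0.6, 0.7, 0.5, -0.2)
)

base_graph <- ggplot(toy_points, aes(x = hour, y = sentiment)) +
  geom_point()

# Two calls: the second replaces the first, so the y limits are dropped
two_calls <- base_graph +
  coord_cartesian(ylim = c(-1, 1)) +
  coord_cartesian(xlim = c(0, 23))

# One call: both the x and y limits are kept
one_call <- base_graph +
  coord_cartesian(xlim = c(0, 23), ylim = c(-1, 1)) +
  scale_x_continuous(breaks = 0:23)
```

With `one_call`, the y axis runs from -1 to 1 as intended and the message is no longer printed.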

Dynamic Scatter Plot

This graph will allow us to see whether there is a relationship between the hour an email was sent and the sentiment score of its body on each of the seven days. Perhaps Monday emails will have lower sentiment scores in the morning hours compared to Friday emails? This is the report’s defining graph: it compares email sentiment across the hours of the day, one day of the week at a time.

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

Learning Reflection

An Important Idea

One key idea I have learnt from module 4 is how complex the relationship is between data and its representation in dynamic graphics. I was aware that these visualisations existed, but this module has taught me the intricacies of creating them, along with the limitations and steps involved. I have also learnt how different forms of data can be used in conjunction with each other, and what is and is not possible or appropriate for a given visualisation.

Creativity

My demonstration of creativity is the following grid of graphs.

## # A tibble: 234 × 6
##    date                 year month day        hour sentiment_body
##    <chr>               <dbl> <chr> <chr>     <dbl>          <dbl>
##  1 10/05/2023 17:29:55  2023 May   Wednesday    17          0.518
##  2 10/05/2023 10:58:04  2023 May   Wednesday    10          0.846
##  3 10/05/2023 10:24:30  2023 May   Wednesday    10          0.700
##  4 10/05/2023 09:32:26  2023 May   Wednesday     9          0.652
##  5 10/05/2023 04:50:26  2023 May   Wednesday     4          0.766
##  6 09/05/2023 20:16:17  2023 May   Tuesday      20          0.683
##  7 09/05/2023 18:18:23  2023 May   Tuesday      18          0.587
##  8 09/05/2023 16:00:08  2023 May   Tuesday      16          0.643
##  9 09/05/2023 14:08:00  2023 May   Tuesday      14          0.514
## 10 09/05/2023 13:43:51  2023 May   Tuesday      13          0.717
## # … with 224 more rows
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

This grid allows us to compare and contrast the bar graphs and the scatter plots. It demonstrates creativity by using image manipulation functions to build a single image for comparing graph types. The comparison is useful: the bar graph suggests that the third hour (03:00) is when I receive, on average, the emails with the lowest sentiment scores, whereas the scatter plot does not convey this large difference. Perhaps that is when my enemies are awake.
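
One possible reason the two graph types can disagree is that a mean hides how many emails it is based on: a single very negative email is enough to produce the lowest bar. I have not checked whether this is what happens at 03:00 in my data, so the sketch below uses made-up values (`toy_emails` is invented) only to show how the count could be reported next to the mean.

```r
library(dplyr)

# Hypothetical emails (invented values, illustration only)
toy_emails <- tibble(
  hour = c(3, 9, 9, 9, 9),
  sentiment_body = c(-0.9, 0.7, 0.6, -0.5, 0.8)
)

# Mean sentiment and number of emails per hour, lowest mean first
hour_summary <- toy_emails %>%
  group_by(hour) %>%
  summarise(mean_sentiment = mean(sentiment_body),
            email_count = n()) %>%
  arrange(mean_sentiment)

hour_summary
```

In this toy data the lowest mean belongs to hour 3, but it rests on one email, which a scatter plot would show as a single point.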

Data Technology Curiosities for the Future

In my previous project I said that I wished to see 3D models. I think we may be nearing that point in the course, and I am curious about how we would achieve this using R. I am taking a 373 CS course which discusses computer graphics (something tangentially related), and I am interested in how data is drawn on the screen using computer science techniques. I hope to see and learn the techniques for converting data into a 3D visualisation.

Appendix

library(tidyverse)
library(ggplot2)
library(gganimate)
library(dplyr)
library(magick)

# Get CSV file
data_csv_file <- "https://docs.google.com/spreadsheets/d/e/2PACX-1vRbfqamg_dxAXGj1NK3prbq6B1wleNQ2qLnTElGo4XQ02nnZq3ZAkupX0z2CbZ1KFwGflEyD_ml9dY7/pub?output=csv"

# Get sentiment body data and email time statistics
email_sentiment_data <- 
  read_csv(data_csv_file) %>%
  filter(!is.na(sentiment_body)) %>% # sentiment_body is numeric, so missing scores are NA
  select(email_date,
         email_year,
         email_month,
         email_day,
         email_hour,
         sentiment_body) %>%
  rename(date = 1,
         year = 2,
         month = 3,
         day = 4,
         hour = 5) %>%
  slice(1:500)

email_sentiment_data

#################################
# MEAN DAY SENTIMENTALITY GRAPH #
#################################
# MEAN SENTIMENTALITY FOR EACH DAY
day_sentiment_data <- email_sentiment_data %>%
  group_by(day) %>%
  summarise(mean_sentiment_rounded = round(mean(sentiment_body), 3)) # ROUND TO THREE
# MAKE GRAPH VIA GGPLOT
day_sentiment_column_graph <- ggplot(data = day_sentiment_data) +
  geom_col(aes(x = day, y = mean_sentiment_rounded, fill = day)) +
  geom_text(aes(x = day, y = mean_sentiment_rounded, label = mean_sentiment_rounded), 
            vjust = ifelse(day_sentiment_data$mean_sentiment_rounded >= 0, -0.5, 1.5), 
            size = 3) + # GIVE SENTIMENTALITY VALUE FOR EACH DAY
  coord_cartesian(ylim = c(-1, 1)) + # SENTIMENTALITY CAN BE NEGATIVE!!!!
  labs(title = "Sentiments by the Weekdays",
       subtitle = "Comparing the mean sentiment of email bodies by each day from my recent emails",
       x = "Day", y = "Mean Email Body Sentiment")
# Save graph using ggsave
ggsave("day_sentiment_column_graph.png", day_sentiment_column_graph, height = 6, width = 9, dpi = 320)


##################################
# MEAN HOUR SENTIMENTALITY GRAPH #
##################################
# MEAN SENTIMENTALITY FOR EACH HOUR
hour_sentiment_data <- email_sentiment_data %>%
  group_by(hour) %>%
  summarise(mean_sentiment_rounded = round(mean(sentiment_body), 3)) # ROUND TO THREE
# MAKE SENTIMENTALITY HOUR GRAPH
hour_sentiment_column_graph <- ggplot(data = hour_sentiment_data) +
  geom_col(aes(x = hour, y = mean_sentiment_rounded, fill = hour)) +
  geom_text(aes(x = hour, y = mean_sentiment_rounded, label = mean_sentiment_rounded), 
            vjust = ifelse(hour_sentiment_data$mean_sentiment_rounded >= 0, -0.5, 1.5), 
            size = 3) + # GIVE SENTIMENTALITY VALUE FOR EACH HOUR
  # ONE coord_cartesian() CALL: A SECOND CALL WOULD REPLACE THE FIRST AND DROP ITS LIMITS
  # SENTIMENTALITY CAN BE NEGATIVE!!!! 24 HOURS (INPUTTING 24 ADDED A 25TH HOUR)
  coord_cartesian(xlim = c(0, 23), ylim = c(-1, 1)) +
  scale_x_continuous(breaks = 0:23) + # SHOW EVERY HOUR ON X AXIS
  # There does not exist a 24th hour on the 24 hour clock
  labs(title = "Sentiments on the 24-hour Clock",
       subtitle = "Comparing the mean sentiment of email bodies by each hour of the day from my recent emails",
       x = "Hour", y = "Mean Email Body Sentiment")
# Save graph using ggsave
ggsave("hour_sentiment_column_graph.png", hour_sentiment_column_graph, height = 6, width = 9, dpi = 320)

###################
# HOUR PLOT GRAPH #
###################
hour_sentiment_plot_data <- email_sentiment_data %>%
  group_by(hour) %>%
  reframe(sentiment_rounded = round(sentiment_body, 3)) # reframe() ALLOWS MANY ROWS PER GROUP
hour_sentiment_plot_graph <- ggplot(data = hour_sentiment_plot_data, aes(x = hour, y = sentiment_rounded)) + 
  geom_point() + 
# TOO MESSY, NOT GOOD TO LOOK AT
#  geom_text(aes(x = hour, y = sentiment_rounded, label = sentiment_rounded), 
#            vjust = ifelse(hour_sentiment_plot_data$sentiment_rounded >= 0, -0.5, 1.5), 
#            size = 3) + # GIVE SENTIMENTALITY VALUE FOR EACH HOUR
  # ONE coord_cartesian() CALL: A SECOND CALL WOULD REPLACE THE FIRST AND DROP ITS LIMITS
  coord_cartesian(xlim = c(0, 24), ylim = c(-1, 1)) + # SENTIMENTALITY CAN BE NEGATIVE!!!!
  scale_x_continuous(breaks = 0:23) + # SHOW EVERY HOUR ON X AXIS
  # There does not exist a 24th hour on the 24 hour clock! (0 to 23 technically)
  labs(title = "Sentiments on the 24-hour Clock",
       subtitle = "Comparing the sentiment of email bodies by each hour of the day from my recent emails in a scatter plot",
       x = "Hour", y = "Email Body Sentiment")
plot(hour_sentiment_plot_graph)
ggsave("hour_sentiment_plot_graph.png", hour_sentiment_plot_graph, height = 6, width = 9, dpi = 320)

# Much better than just looking at the mean!

##################
# DAY PLOT GRAPH #
##################
day_sentiment_plot_data <- email_sentiment_data %>%
  group_by(day) %>%
  reframe(sentiment_rounded = round(sentiment_body, 3)) # reframe() ALLOWS MANY ROWS PER GROUP
day_sentiment_plot_graph <- ggplot(data = day_sentiment_plot_data, aes(x = day, y = sentiment_rounded)) + 
  geom_point() + 
  coord_cartesian(ylim = c(-1, 1)) + # SENTIMENTALITY CAN BE NEGATIVE!!!!
  labs(title = "Sentiments by the Weekdays",
       subtitle = "Comparing the sentiment of email bodies by each day from my recent emails in a scatter plot",
       x = "Day", y = "Email Body Sentiment")
plot(day_sentiment_plot_graph)
ggsave("day_sentiment_plot_graph.png", day_sentiment_plot_graph, height = 6, width = 9, dpi = 320)

# Able to see that the email body sentiments are either rather positive 
# or rather negative. Seems to be almost no neutral sentiments for email bodies.

##########################
# HOUR-DAY RELATIONSHIP? #
##########################
# Get grouped by day 
hour_day_sentiment_data <- email_sentiment_data %>%
  group_by(hour, day) %>%
  reframe(sentiment_rounded = round(sentiment_body, 3))

hour_day_sentiment_plot_graph <- ggplot(data = hour_day_sentiment_data, aes(x = hour, y = sentiment_rounded, colour = day)) + 
  geom_point() + 
  # ONE coord_cartesian() CALL: A SECOND CALL WOULD REPLACE THE FIRST AND DROP ITS LIMITS
  coord_cartesian(xlim = c(0, 24), ylim = c(-1, 1)) +
  scale_x_continuous(breaks = 0:23) +
  labs(title = "Sentiments on the 24-hour Clock, Day by Day.",
       subtitle = "Comparing the sentiment of email bodies by each hour of the day, for the days in a week from my recent emails in a scatter plot",
       x = "Hour", y = "Email Body Sentiment")
plot(hour_day_sentiment_plot_graph)

animated_graph <- hour_day_sentiment_plot_graph + 
  transition_states(day,
                    transition_length = 2,
                    state_length = 1)
animated_graph
anim_save("animated_graph.gif", animated_graph)

########
# GRID #
########
# Top Images
day_col_img <- image_read("day_sentiment_column_graph.png")
hour_col_img <- image_read("hour_sentiment_column_graph.png")
column_vector <- c(day_col_img, hour_col_img)
top_image <- image_append(column_vector, stack = FALSE)
# Bottom Images
day_plot_img <- image_read("day_sentiment_plot_graph.png")
hour_plot_img <- image_read("hour_sentiment_plot_graph.png")
plot_vector <- c(day_plot_img, hour_plot_img)
bottom_image <- image_append(plot_vector, stack = FALSE)

four_grid_image <- image_append(c(top_image, bottom_image), stack = TRUE)
four_grid_image %>%
  image_write("four_grid_image.png")